{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "provenance": [],
   "collapsed_sections": [
    "uijiWgkuh1jB",
    "MRiOtrywfAo1",
    "_gDkU-j-fCmZ",
    "3Zpv4S0-fDBv",
    "Dr49PotrfG01"
   ]
  },
  "kernelspec": {
   "name": "pycharm-3ea5732f",
   "language": "python",
   "display_name": "PyCharm (ai_quant_trade)"
  },
  "language_info": {
   "name": "python"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
     "# 1. Install Dependencies and Import Modules\n",
     "\n",
     "This example reproduces the paper *Practical Deep Reinforcement Learning Approach for Stock Trading* (https://arxiv.org/abs/1811.07522).\n",
     "\n",
     "Code reference: [FinRL-Tutorials](https://github.com/AI4Finance-Foundation/FinRL-Tutorials)"
   ],
   "metadata": {
    "id": "gT-zXutMgqOS"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
     "How it works:\n",
     "\n",
     "The core of reinforcement learning is an agent interacting with an environment. The loop is roughly:\n",
     "- The agent observes the current condition of the environment, called the **state**, and can execute an **action**\n",
     "- After executing an action, the agent transitions to a new state, and the environment returns a numeric **reward**\n",
     "  indicating how good the new state is\n",
     "- The agent and environment keep repeating this interaction, and the agent tries to maximize the cumulative reward\n",
     "\n",
     "Reinforcement learning is thus a method by which an agent learns to improve its performance and achieve a goal."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
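   {
    "cell_type": "code",
    "source": [
     "# Illustrative sketch only (not part of the pipeline below): the generic\n",
     "# agent-environment interaction loop described above, written in Gym API\n",
     "# terms. `env` and `policy` are hypothetical placeholders, so the loop is\n",
     "# left commented out.\n",
     "#\n",
     "# obs = env.reset()\n",
     "# done, total_reward = False, 0.0\n",
     "# while not done:\n",
     "#     action = policy(obs)                         # agent picks an action from the state\n",
     "#     obs, reward, done, info = env.step(action)   # environment returns new state + reward\n",
     "#     total_reward += reward                       # agent maximizes cumulative reward"
    ],
    "metadata": {},
    "execution_count": null,
    "outputs": []
   },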
  {
   "cell_type": "markdown",
   "source": [
     "Implementation notes\n",
     "\n",
     "The stock trading environment is built in the OpenAI Gym format.\n",
     "\n",
     "The state-action-reward terms mean the following:\n",
     "\n",
     "- **State s**: the state space represents the agent's perception of the market, much as a human trader\n",
     "  analyzes various information and data. The agent observes historical prices and technical indicators,\n",
     "  and learns by interacting with the environment (usually by replaying historical data).\n",
     "  \n",
     "- **Action a**: the action space represents the actions the agent can take in each state. For example,\n",
     "  a ∈ {−1, 0, 1}, where −1, 0, 1 represent sell, hold, and buy. With multiple stocks,\n",
     "  a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., \"buy 10 shares of AAPL\" or \"sell 10 shares of AAPL\" is 10 or −10.\n",
     "\n",
     "- **Reward function r(s, a, s′)**: the reward incentivizes the agent to learn a better policy. For example,\n",
     "  executing action a in state s changes the portfolio value and leads to a new state s′, and\n",
     "  r(s, a, s′) = v′ − v, where v′ and v are the total portfolio values in states s′ and s, respectively.\n",
     "  \n",
     "- **Market environment**: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA), with all\n",
     "  trading data over the backtest period."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
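   {
    "cell_type": "code",
    "source": [
     "# Illustrative sketch only (not used by the pipeline below): the reward\n",
     "# r(s, a, s') = v' - v from the text, computed from hypothetical portfolio\n",
     "# values (cash plus holdings times prices). All numbers here are made up.\n",
     "import numpy as np\n",
     "\n",
     "def portfolio_value(cash, holdings, prices):\n",
     "    # total market value of the account: cash plus the value of all positions\n",
     "    return cash + float(np.dot(holdings, prices))\n",
     "\n",
     "v = portfolio_value(1_000_000, np.array([0, 0]), np.array([100.0, 50.0]))\n",
     "v_next = portfolio_value(998_500, np.array([10, 10]), np.array([101.0, 51.0]))\n",
     "reward = v_next - v  # positive when the action increased total portfolio value\n",
     "print(reward)  # 20.0"
    ],
    "metadata": {},
    "execution_count": null,
    "outputs": []
   },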
  {
   "cell_type": "code",
   "source": [
     "# Uncomment to install dependencies if they are not installed yet\n",
     "# !pip install -r requirements.txt\n",
     "\n",
     "## Uncomment the line below to install the finrl library,\n",
     "# or copy the finrl folder from the GitHub repo below into the project root\n",
     "##!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git\n",
     "\n",
     "# Reinforcement learning library: stable_baselines3\n",
     "# Notes:\n",
     "#    1. Reinforcement learning is much slower than classical machine learning:\n",
     "#       ML trains in about 2 minutes on CPU, while RL takes roughly 30 minutes on a single GPU\n",
     "#    2. RL training is unstable: the converged loss differs between runs, and results can vary widely"
   ],
   "metadata": {
    "id": "D0vEcPxSJ8hI",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 1,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "import pandas as pd\n",
    "\n",
    "from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split\n",
    "from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv\n",
    "from finrl.agents.stablebaselines3.models import DRLAgent\n",
    "from stable_baselines3.common.logger import configure\n",
    "from finrl.meta.data_processor import DataProcessor"
   ],
   "metadata": {
    "id": "xt1317y2ixSS",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 2,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
     "from finrl import config  # various hyperparameters\n",
     "from finrl import config_tickers  # ticker lists for common markets, e.g., the CSI 300\n",
    "import os\n",
    "from finrl.main import check_and_make_directories\n",
    "from finrl.config import (\n",
    "    DATA_SAVE_DIR,\n",
    "    TRAINED_MODEL_DIR,\n",
    "    TENSORBOARD_LOG_DIR,\n",
    "    RESULTS_DIR,\n",
    "    INDICATORS,\n",
    "    TRAIN_START_DATE,\n",
    "    TRAIN_END_DATE,\n",
    "    TEST_START_DATE,\n",
    "    TEST_END_DATE,\n",
    "    TRADE_START_DATE,\n",
    "    TRADE_END_DATE,\n",
    ")\n",
     "# Create output directories\n",
    "check_and_make_directories([TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])"
   ],
   "metadata": {
    "id": "wZ7Bl7i6I2AM",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 3,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
     "# 2. Build the Market Environment in OpenAI Gym Style"
   ],
   "metadata": {
    "id": "aWrSrQv3i0Ng"
   }
  },
  {
   "cell_type": "code",
   "source": [
     "# Load the training data. All stocks are mixed in a single csv in this format:\n",
     "# index   date          ticker\n",
     "#  0      2009-01-02    Apple\n",
     "#  0      2009-01-02    Amazon\n",
     "#  1      2009-01-05    Apple\n",
     "#  1      2009-01-05    Amazon\n",
     "\n",
     "# Note: this format must be kept, with at least 2 rows per index value, or an error is raised.\n",
     "# Reason:\n",
     "#   1. In the _initiate_state function of\n",
     "#      finrl/meta/env_stock_trading/env_stocktrading.py,\n",
     "#      self.data.close.values.tolist() (around line 404) requires\n",
     "#      self.data.close to be two-dimensional\n",
     "#   2. In __init__ of the same file (around line 64),\n",
     "#      self.data = self.df.loc[self.day, :]; with a sequential index\n",
     "#      (0, 1, 2, ...) this selects only a single row, and the resulting\n",
     "#      one-dimensional data triggers the problem in point 1\n",
     "#      (so with only one stock, all index values would need to be made\n",
     "#      identical; that case hardly ever occurs and can be ignored for now)\n",
     "# Workarounds:\n",
     "# 1. Downgrade numpy\n",
     "# 2. Reshape the data to 2-D, i.e. (10,) -> (1, 10) (whether this affects backtest\n",
     "#    completeness has not been verified in detail)\n",
     "# 3. Keep the data in the format shown above (recommended)\n",
     "\n",
     "train = pd.read_csv(os.path.join(DATA_SAVE_DIR, 'train_data.csv'))\n",
     "train = train.set_index(train.columns[0])  # use the first column as the index\n",
     "train.index.names = ['']\n",
     "assert train.shape[0] > 1, 'the data must contain at least 2 rows, i.e. 2 or more days'"
   ],
   "metadata": {
    "id": "mFCP1YEhi6oi",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 4,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
     "# 8 technical indicators are defined by default\n",
    "INDICATORS"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "pwk32SeKJGWZ",
    "outputId": "258b8796-6fd7-445e-eb4a-d964597d248b",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 5,
   "outputs": [
    {
     "data": {
      "text/plain": "['macd',\n 'boll_ub',\n 'boll_lb',\n 'rsi_30',\n 'cci_30',\n 'dx_30',\n 'close_30_sma',\n 'close_60_sma']"
     },
     "metadata": {},
     "output_type": "execute_result",
     "execution_count": 5
    }
   ]
  },
  {
   "cell_type": "code",
   "source": [
     "# 29 stocks; state space of size 291\n",
     "stock_dimension = len(train.tic.unique())\n",
     "# State layout:\n",
     "# 1: account balance\n",
     "# [1: stock_dimension+1]: stock prices\n",
     "# [stock_dimension+1: 1 + 2*stock_dimension]: shares held per stock\n",
     "# len(INDICATORS)*stock_dimension: technical indicator values for each stock\n",
    "state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension\n",
    "print(f\"Stock Dimension: {stock_dimension}, State Space: {state_space}\")"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "7T3DZPoaIm8k",
    "outputId": "1455279f-d280-4a4f-a555-935fadd2bdb7",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 6,
   "outputs": [
    {
     "name": "stdout",
     "text": [
      "Stock Dimension: 29, State Space: 291\n"
     ],
     "output_type": "stream"
    }
   ]
  },
  {
   "cell_type": "code",
   "source": [
     "buy_cost_list = sell_cost_list = [0.001] * stock_dimension  # transaction cost (0.1%)\n",
    "num_stock_shares = [0] * stock_dimension\n",
    "\n",
    "env_kwargs = {\n",
    "    \"hmax\": 100,\n",
    "    \"initial_amount\": 1000000,\n",
    "    \"num_stock_shares\": num_stock_shares,\n",
    "    \"buy_cost_pct\": buy_cost_list,\n",
    "    \"sell_cost_pct\": sell_cost_list,\n",
    "    \"state_space\": state_space,\n",
    "    \"stock_dim\": stock_dimension,\n",
    "    \"tech_indicator_list\": INDICATORS,\n",
    "    \"action_space\": stock_dimension,\n",
    "    \"reward_scaling\": 1e-4\n",
    "}\n",
    "\n",
    "e_train_gym = StockTradingEnv(df = train, **env_kwargs)"
   ],
   "metadata": {
    "id": "WsOLoeNcJF8Q",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 7,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "source": [
     "## 2.1 Build the Training Environment"
   ],
   "metadata": {
    "id": "7We-q73jjaFQ"
   }
  },
  {
   "cell_type": "code",
   "source": [
    "env_train, _ = e_train_gym.get_sb_env()\n",
    "print(type(env_train))"
   ],
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "aS-SHiGRJK-4",
    "outputId": "35605c17-bdda-4f30-86bd-6db9b80c1f1e",
    "pycharm": {
     "is_executing": false
    }
   },
   "execution_count": 8,
   "outputs": [
    {
     "name": "stdout",
     "text": [
      "<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>\n"
     ],
     "output_type": "stream"
    }
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HMNR5nHjh1iz"
   },
   "source": [
     "# 3. Train Deep Reinforcement Learning Models\n",
     "* RL library: **Stable Baselines 3**. You can also try **ElegantRL** and **Ray RLlib**.\n",
     "* The FinRL library includes fine-tuned standard DRL algorithms, including DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3."
   ]
  },
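   {
    "cell_type": "code",
    "source": [
     "# Sketch of the FinRL training workflow used in the cells below, shown here\n",
     "# for orientation only (left commented out; the actual training cells\n",
     "# configure per-algorithm logging as well).\n",
     "#\n",
     "# agent = DRLAgent(env=env_train)\n",
     "# model_a2c = agent.get_model(\"a2c\")   # build an SB3 A2C model with FinRL defaults\n",
     "# trained_a2c = agent.train_model(model=model_a2c,\n",
     "#                                 tb_log_name=\"a2c\",\n",
     "#                                 total_timesteps=50_000)"
    ],
    "metadata": {},
    "execution_count": null,
    "outputs": []
   },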
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "364PsqckttcQ",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
     "# Select which RL algorithms to use\n",
     "\n",
     "# 5 algorithms: A2C, DDPG, PPO, TD3, SAC\n",
    "if_using_a2c = True\n",
    "if_using_ddpg = False\n",
    "if_using_ppo = False\n",
    "if_using_td3 = False\n",
    "if_using_sac = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uijiWgkuh1jB"
   },
   "source": [
    "## Agent 1: A2C\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "GUCnkn-HIbmj",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "outputId": "af2f8c1e-a300-4dc4-fe3d-84861bb4606d",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "text": [
      "{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}\n",
      "Using cuda device\n",
      "Logging to results/a2c\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 56         |\n",
      "|    iterations         | 100        |\n",
      "|    time_elapsed       | 8          |\n",
      "|    total_timesteps    | 500        |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.2      |\n",
      "|    explained_variance | -0.748     |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 99         |\n",
      "|    policy_loss        | -38.2      |\n",
      "|    reward             | -0.2910683 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 2.75       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 69         |\n",
      "|    iterations         | 200        |\n",
      "|    time_elapsed       | 14         |\n",
      "|    total_timesteps    | 1000       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | -1.19e-07  |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 199        |\n",
      "|    policy_loss        | -52.4      |\n",
      "|    reward             | -1.7289275 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 1.72       |\n",
      "--------------------------------------\n",
      "------------------------------------\n",
      "| time/                 |          |\n",
      "|    fps                | 73       |\n",
      "|    iterations         | 300      |\n",
      "|    time_elapsed       | 20       |\n",
      "|    total_timesteps    | 1500     |\n",
      "| train/                |          |\n",
      "|    entropy_loss       | -41.3    |\n",
      "|    explained_variance | -0.25    |\n",
      "|    learning_rate      | 0.0007   |\n",
      "|    n_updates          | 299      |\n",
      "|    policy_loss        | -228     |\n",
      "|    reward             | 4.976525 |\n",
      "|    std                | 1.01     |\n",
      "|    value_loss         | 33.5     |\n",
      "------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 76        |\n",
      "|    iterations         | 400       |\n",
      "|    time_elapsed       | 26        |\n",
      "|    total_timesteps    | 2000      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 399       |\n",
      "|    policy_loss        | 45.8      |\n",
      "|    reward             | 3.6554909 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 2.65      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 77        |\n",
      "|    iterations         | 500       |\n",
      "|    time_elapsed       | 32        |\n",
      "|    total_timesteps    | 2500      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | 2.38e-07  |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 499       |\n",
      "|    policy_loss        | 304       |\n",
      "|    reward             | -7.385315 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 75.1      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 78        |\n",
      "|    iterations         | 600       |\n",
      "|    time_elapsed       | 38        |\n",
      "|    total_timesteps    | 3000      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | -0.058    |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 599       |\n",
      "|    policy_loss        | 158       |\n",
      "|    reward             | 0.5738542 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 14.5      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 79         |\n",
      "|    iterations         | 700        |\n",
      "|    time_elapsed       | 44         |\n",
      "|    total_timesteps    | 3500       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.4      |\n",
      "|    explained_variance | -0.281     |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 699        |\n",
      "|    policy_loss        | -103       |\n",
      "|    reward             | -3.3156214 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 6.27       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 80         |\n",
      "|    iterations         | 800        |\n",
      "|    time_elapsed       | 49         |\n",
      "|    total_timesteps    | 4000       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.4      |\n",
      "|    explained_variance | -0.249     |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 799        |\n",
      "|    policy_loss        | -36.4      |\n",
      "|    reward             | -1.1006471 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 1.5        |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 80         |\n",
      "|    iterations         | 900        |\n",
      "|    time_elapsed       | 55         |\n",
      "|    total_timesteps    | 4500       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 899        |\n",
      "|    policy_loss        | 118        |\n",
      "|    reward             | 0.53598815 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 12         |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 80        |\n",
      "|    iterations         | 1000      |\n",
      "|    time_elapsed       | 61        |\n",
      "|    total_timesteps    | 5000      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 999       |\n",
      "|    policy_loss        | 31.4      |\n",
      "|    reward             | -8.689872 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 15.3      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 1100      |\n",
      "|    time_elapsed       | 67        |\n",
      "|    total_timesteps    | 5500      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 1099      |\n",
      "|    policy_loss        | -457      |\n",
      "|    reward             | 6.8270416 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 126       |\n",
      "-------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 1200        |\n",
      "|    time_elapsed       | 73          |\n",
      "|    total_timesteps    | 6000        |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.3       |\n",
      "|    explained_variance | -1.19e-07   |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 1199        |\n",
      "|    policy_loss        | -144        |\n",
      "|    reward             | -0.46467865 |\n",
      "|    std                | 1.01        |\n",
      "|    value_loss         | 13.4        |\n",
      "---------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 1300       |\n",
      "|    time_elapsed       | 80         |\n",
      "|    total_timesteps    | 6500       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 1299       |\n",
      "|    policy_loss        | -34.9      |\n",
      "|    reward             | -4.7488437 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 4.8        |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 1400       |\n",
      "|    time_elapsed       | 86         |\n",
      "|    total_timesteps    | 7000       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 1399       |\n",
      "|    policy_loss        | 260        |\n",
      "|    reward             | -2.5306706 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 45.5       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 1500      |\n",
      "|    time_elapsed       | 92        |\n",
      "|    total_timesteps    | 7500      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | -1.19e-07 |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 1499      |\n",
      "|    policy_loss        | 108       |\n",
      "|    reward             | 5.976305  |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 14.2      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 1600       |\n",
      "|    time_elapsed       | 97         |\n",
      "|    total_timesteps    | 8000       |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 1599       |\n",
      "|    policy_loss        | -25.4      |\n",
      "|    reward             | -1.3046212 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 1.58       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 82        |\n",
      "|    iterations         | 1700      |\n",
      "|    time_elapsed       | 102       |\n",
      "|    total_timesteps    | 8500      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 1.19e-07  |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 1699      |\n",
      "|    policy_loss        | -80.9     |\n",
      "|    reward             | 0.8930253 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 73.1      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 82        |\n",
      "|    iterations         | 1800      |\n",
      "|    time_elapsed       | 108       |\n",
      "|    total_timesteps    | 9000      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.2     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 1799      |\n",
      "|    policy_loss        | 0.984     |\n",
      "|    reward             | 3.0571103 |\n",
      "|    std                | 1         |\n",
      "|    value_loss         | 0.775     |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 82        |\n",
      "|    iterations         | 1900      |\n",
      "|    time_elapsed       | 115       |\n",
      "|    total_timesteps    | 9500      |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.2     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 1899      |\n",
      "|    policy_loss        | -53.9     |\n",
      "|    reward             | 0.9994806 |\n",
      "|    std                | 1         |\n",
      "|    value_loss         | 4.7       |\n",
      "-------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 82          |\n",
      "|    iterations         | 2000        |\n",
      "|    time_elapsed       | 121         |\n",
      "|    total_timesteps    | 10000       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.2       |\n",
      "|    explained_variance | -1.19e-07   |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 1999        |\n",
      "|    policy_loss        | -132        |\n",
      "|    reward             | -0.08022534 |\n",
      "|    std                | 1           |\n",
      "|    value_loss         | 11.7        |\n",
      "---------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 2100      |\n",
      "|    time_elapsed       | 128       |\n",
      "|    total_timesteps    | 10500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.2     |\n",
      "|    explained_variance | -1.19e-07 |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 2099      |\n",
      "|    policy_loss        | -19.4     |\n",
      "|    reward             | 3.730105  |\n",
      "|    std                | 1         |\n",
      "|    value_loss         | 2.32      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 2200       |\n",
      "|    time_elapsed       | 134        |\n",
      "|    total_timesteps    | 11000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.2      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 2199       |\n",
      "|    policy_loss        | -211       |\n",
      "|    reward             | -11.373576 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 48         |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 2300      |\n",
      "|    time_elapsed       | 140       |\n",
      "|    total_timesteps    | 11500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.2     |\n",
      "|    explained_variance | -1.19e-07 |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 2299      |\n",
      "|    policy_loss        | -3.44e+03 |\n",
      "|    reward             | 11.289641 |\n",
      "|    std                | 1         |\n",
      "|    value_loss         | 7.66e+03  |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 2400       |\n",
      "|    time_elapsed       | 147        |\n",
      "|    total_timesteps    | 12000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.2      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 2399       |\n",
      "|    policy_loss        | 57.2       |\n",
      "|    reward             | 0.95172143 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 5.88       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 2500       |\n",
      "|    time_elapsed       | 153        |\n",
      "|    total_timesteps    | 12500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | -1.19e-07  |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 2499       |\n",
      "|    policy_loss        | -52.9      |\n",
      "|    reward             | 0.43206617 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 2.25       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 2600      |\n",
      "|    time_elapsed       | 159       |\n",
      "|    total_timesteps    | 13000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 2599      |\n",
      "|    policy_loss        | 3.12      |\n",
      "|    reward             | 1.6909766 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 0.264     |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 2700       |\n",
      "|    time_elapsed       | 165        |\n",
      "|    total_timesteps    | 13500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | -1.19e-07  |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 2699       |\n",
      "|    policy_loss        | -91.8      |\n",
      "|    reward             | 0.28665826 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 6.41       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 2800      |\n",
      "|    time_elapsed       | 171       |\n",
      "|    total_timesteps    | 14000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 1.19e-07  |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 2799      |\n",
      "|    policy_loss        | 408       |\n",
      "|    reward             | 0.5753403 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 124       |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 2900      |\n",
      "|    time_elapsed       | 177       |\n",
      "|    total_timesteps    | 14500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 2899      |\n",
      "|    policy_loss        | -75.4     |\n",
      "|    reward             | 1.1065758 |\n",
      "|    std                | 1         |\n",
      "|    value_loss         | 3.76      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 3000       |\n",
      "|    time_elapsed       | 183        |\n",
      "|    total_timesteps    | 15000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 2999       |\n",
      "|    policy_loss        | 93.6       |\n",
      "|    reward             | 0.77030516 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 5.9        |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 3100       |\n",
      "|    time_elapsed       | 189        |\n",
      "|    total_timesteps    | 15500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 3099       |\n",
      "|    policy_loss        | 68.3       |\n",
      "|    reward             | -0.6511301 |\n",
      "|    std                | 1          |\n",
      "|    value_loss         | 3.94       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 3200       |\n",
      "|    time_elapsed       | 195        |\n",
      "|    total_timesteps    | 16000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.3      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 3199       |\n",
      "|    policy_loss        | -21.5      |\n",
      "|    reward             | -1.6668897 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 7.1        |\n",
      "--------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 3300        |\n",
      "|    time_elapsed       | 202         |\n",
      "|    total_timesteps    | 16500       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.4       |\n",
      "|    explained_variance | -0.00572    |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 3299        |\n",
      "|    policy_loss        | 41.7        |\n",
      "|    reward             | -0.10868696 |\n",
      "|    std                | 1.01        |\n",
      "|    value_loss         | 1.44        |\n",
      "---------------------------------------\n",
      "------------------------------------\n",
      "| time/                 |          |\n",
      "|    fps                | 81       |\n",
      "|    iterations         | 3400     |\n",
      "|    time_elapsed       | 208      |\n",
      "|    total_timesteps    | 17000    |\n",
      "| train/                |          |\n",
      "|    entropy_loss       | -41.4    |\n",
      "|    explained_variance | 0        |\n",
      "|    learning_rate      | 0.0007   |\n",
      "|    n_updates          | 3399     |\n",
      "|    policy_loss        | 93.5     |\n",
      "|    reward             | 2.739077 |\n",
      "|    std                | 1.01     |\n",
      "|    value_loss         | 7.22     |\n",
      "------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 3500      |\n",
      "|    time_elapsed       | 214       |\n",
      "|    total_timesteps    | 17500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | -0.00648  |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 3499      |\n",
      "|    policy_loss        | 89.6      |\n",
      "|    reward             | -0.526905 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 6.39      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 3600      |\n",
      "|    time_elapsed       | 220       |\n",
      "|    total_timesteps    | 18000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | -1.19e-07 |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 3599      |\n",
      "|    policy_loss        | -50.6     |\n",
      "|    reward             | 2.4003286 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 3.12      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 3700      |\n",
      "|    time_elapsed       | 226       |\n",
      "|    total_timesteps    | 18500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.4     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 3699      |\n",
      "|    policy_loss        | 137       |\n",
      "|    reward             | 1.4176131 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 14.5      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 3800       |\n",
      "|    time_elapsed       | 232        |\n",
      "|    total_timesteps    | 19000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.4      |\n",
      "|    explained_variance | -0.00132   |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 3799       |\n",
      "|    policy_loss        | 117        |\n",
      "|    reward             | 0.65599006 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 10.5       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 3900       |\n",
      "|    time_elapsed       | 238        |\n",
      "|    total_timesteps    | 19500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.5      |\n",
      "|    explained_variance | 1.19e-07   |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 3899       |\n",
      "|    policy_loss        | 72.6       |\n",
      "|    reward             | 0.21731114 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 6.96       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 4000      |\n",
      "|    time_elapsed       | 244       |\n",
      "|    total_timesteps    | 20000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.5     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 3999      |\n",
      "|    policy_loss        | -39       |\n",
      "|    reward             | 2.0023088 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 2.69      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 4100      |\n",
      "|    time_elapsed       | 250       |\n",
      "|    total_timesteps    | 20500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.5     |\n",
      "|    explained_variance | 0.0154    |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 4099      |\n",
      "|    policy_loss        | 14.2      |\n",
      "|    reward             | 0.3399915 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 1.77      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 4200       |\n",
      "|    time_elapsed       | 257        |\n",
      "|    total_timesteps    | 21000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.5      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 4199       |\n",
      "|    policy_loss        | -101       |\n",
      "|    reward             | 0.46735242 |\n",
      "|    std                | 1.01       |\n",
      "|    value_loss         | 7.59       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 4300      |\n",
      "|    time_elapsed       | 263       |\n",
      "|    total_timesteps    | 21500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.5     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 4299      |\n",
      "|    policy_loss        | -90.1     |\n",
      "|    reward             | 2.4704082 |\n",
      "|    std                | 1.01      |\n",
      "|    value_loss         | 4.76      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 4400      |\n",
      "|    time_elapsed       | 269       |\n",
      "|    total_timesteps    | 22000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.6     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 4399      |\n",
      "|    policy_loss        | 117       |\n",
      "|    reward             | 2.6939738 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 10.9      |\n",
      "-------------------------------------\n",
      "----------------------------------------\n",
      "| time/                 |              |\n",
      "|    fps                | 81           |\n",
      "|    iterations         | 4500         |\n",
      "|    time_elapsed       | 275          |\n",
      "|    total_timesteps    | 22500        |\n",
      "| train/                |              |\n",
      "|    entropy_loss       | -41.6        |\n",
      "|    explained_variance | 5.96e-08     |\n",
      "|    learning_rate      | 0.0007       |\n",
      "|    n_updates          | 4499         |\n",
      "|    policy_loss        | 47.5         |\n",
      "|    reward             | -0.028094267 |\n",
      "|    std                | 1.02         |\n",
      "|    value_loss         | 1.74         |\n",
      "----------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 4600       |\n",
      "|    time_elapsed       | 281        |\n",
      "|    total_timesteps    | 23000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.6      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 4599       |\n",
      "|    policy_loss        | 95         |\n",
      "|    reward             | 0.83504915 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 7.6        |\n",
      "--------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 4700        |\n",
      "|    time_elapsed       | 288         |\n",
      "|    total_timesteps    | 23500       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.6       |\n",
      "|    explained_variance | 0           |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 4699        |\n",
      "|    policy_loss        | -49.8       |\n",
      "|    reward             | -0.30735523 |\n",
      "|    std                | 1.02        |\n",
      "|    value_loss         | 3.42        |\n",
      "---------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 4800       |\n",
      "|    time_elapsed       | 294        |\n",
      "|    total_timesteps    | 24000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.6      |\n",
      "|    explained_variance | 1.19e-07   |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 4799       |\n",
      "|    policy_loss        | -44.3      |\n",
      "|    reward             | 0.02049082 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 2.23       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 4900      |\n",
      "|    time_elapsed       | 300       |\n",
      "|    total_timesteps    | 24500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.7     |\n",
      "|    explained_variance | 5.96e-08  |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 4899      |\n",
      "|    policy_loss        | -117      |\n",
      "|    reward             | 1.1208315 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 11.6      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 5000       |\n",
      "|    time_elapsed       | 306        |\n",
      "|    total_timesteps    | 25000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.7      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 4999       |\n",
      "|    policy_loss        | -157       |\n",
      "|    reward             | -1.5758094 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 16.1       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 5100      |\n",
      "|    time_elapsed       | 313       |\n",
      "|    total_timesteps    | 25500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.7     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 5099      |\n",
      "|    policy_loss        | 83.7      |\n",
      "|    reward             | 1.8566744 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 8.17      |\n",
      "-------------------------------------\n",
      "------------------------------------\n",
      "| time/                 |          |\n",
      "|    fps                | 81       |\n",
      "|    iterations         | 5200     |\n",
      "|    time_elapsed       | 319      |\n",
      "|    total_timesteps    | 26000    |\n",
      "| train/                |          |\n",
      "|    entropy_loss       | -41.7    |\n",
      "|    explained_variance | 0        |\n",
      "|    learning_rate      | 0.0007   |\n",
      "|    n_updates          | 5199     |\n",
      "|    policy_loss        | -278     |\n",
      "|    reward             | 6.261811 |\n",
      "|    std                | 1.02     |\n",
      "|    value_loss         | 72       |\n",
      "------------------------------------\n",
      "day: 2892, episode: 10\n",
      "begin_total_asset: 1000000.00\n",
      "end_total_asset: 4788503.65\n",
      "total_reward: 3788503.65\n",
      "total_cost: 13381.94\n",
      "total_trades: 52841\n",
      "Sharpe: 0.800\n",
      "=================================\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 5300        |\n",
      "|    time_elapsed       | 325         |\n",
      "|    total_timesteps    | 26500       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.7       |\n",
      "|    explained_variance | 0           |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 5299        |\n",
      "|    policy_loss        | -2.65       |\n",
      "|    reward             | 0.039904572 |\n",
      "|    std                | 1.02        |\n",
      "|    value_loss         | 0.09        |\n",
      "---------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 5400        |\n",
      "|    time_elapsed       | 331         |\n",
      "|    total_timesteps    | 27000       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.7       |\n",
      "|    explained_variance | 0.0312      |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 5399        |\n",
      "|    policy_loss        | -118        |\n",
      "|    reward             | -0.45222193 |\n",
      "|    std                | 1.02        |\n",
      "|    value_loss         | 12.6        |\n",
      "---------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 5500      |\n",
      "|    time_elapsed       | 338       |\n",
      "|    total_timesteps    | 27500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.7     |\n",
      "|    explained_variance | 0.0454    |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 5499      |\n",
      "|    policy_loss        | 25.7      |\n",
      "|    reward             | 3.8982944 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 15.2      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 5600       |\n",
      "|    time_elapsed       | 344        |\n",
      "|    total_timesteps    | 28000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.7      |\n",
      "|    explained_variance | -0.0207    |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 5599       |\n",
      "|    policy_loss        | -223       |\n",
      "|    reward             | -1.3120009 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 33.4       |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 5700      |\n",
      "|    time_elapsed       | 350       |\n",
      "|    total_timesteps    | 28500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.7     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 5699      |\n",
      "|    policy_loss        | -405      |\n",
      "|    reward             | -4.460994 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 118       |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 5800      |\n",
      "|    time_elapsed       | 356       |\n",
      "|    total_timesteps    | 29000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.7     |\n",
      "|    explained_variance | -3.64     |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 5799      |\n",
      "|    policy_loss        | 37.1      |\n",
      "|    reward             | 2.3169107 |\n",
      "|    std                | 1.02      |\n",
      "|    value_loss         | 17.9      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 5900       |\n",
      "|    time_elapsed       | 362        |\n",
      "|    total_timesteps    | 29500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.7      |\n",
      "|    explained_variance | -0.00338   |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 5899       |\n",
      "|    policy_loss        | 55.1       |\n",
      "|    reward             | -1.1213443 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 3.45       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 6000       |\n",
      "|    time_elapsed       | 368        |\n",
      "|    total_timesteps    | 30000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.7      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 5999       |\n",
      "|    policy_loss        | 75.8       |\n",
      "|    reward             | -0.3227125 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 4.66       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 6100       |\n",
      "|    time_elapsed       | 374        |\n",
      "|    total_timesteps    | 30500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -41.8      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 6099       |\n",
      "|    policy_loss        | -75.9      |\n",
      "|    reward             | -5.2930465 |\n",
      "|    std                | 1.02       |\n",
      "|    value_loss         | 9.55       |\n",
      "--------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 6200        |\n",
      "|    time_elapsed       | 380         |\n",
      "|    total_timesteps    | 31000       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -41.9       |\n",
      "|    explained_variance | 0           |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 6199        |\n",
      "|    policy_loss        | -50.1       |\n",
      "|    reward             | -0.35100487 |\n",
      "|    std                | 1.03        |\n",
      "|    value_loss         | 3.57        |\n",
      "---------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 6300      |\n",
      "|    time_elapsed       | 386       |\n",
      "|    total_timesteps    | 31500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.9     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 6299      |\n",
      "|    policy_loss        | 563       |\n",
      "|    reward             | 5.1647105 |\n",
      "|    std                | 1.03      |\n",
      "|    value_loss         | 246       |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 6400      |\n",
      "|    time_elapsed       | 392       |\n",
      "|    total_timesteps    | 32000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -41.9     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 6399      |\n",
      "|    policy_loss        | 84.3      |\n",
      "|    reward             | 2.1525736 |\n",
      "|    std                | 1.03      |\n",
      "|    value_loss         | 4.84      |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 6500       |\n",
      "|    time_elapsed       | 398        |\n",
      "|    total_timesteps    | 32500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -42        |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 6499       |\n",
      "|    policy_loss        | 64.7       |\n",
      "|    reward             | -4.6953793 |\n",
      "|    std                | 1.03       |\n",
      "|    value_loss         | 6.59       |\n",
      "--------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 6600       |\n",
      "|    time_elapsed       | 404        |\n",
      "|    total_timesteps    | 33000      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -42.1      |\n",
      "|    explained_variance | 0          |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 6599       |\n",
      "|    policy_loss        | -112       |\n",
      "|    reward             | -1.7710909 |\n",
      "|    std                | 1.03       |\n",
      "|    value_loss         | 14         |\n",
      "--------------------------------------\n",
      "[... repeated training-log tables for iterations 6700-10900 omitted: fps steady at 81, learning_rate 0.0007, entropy_loss ~ -42, std ~1.04-1.05, reward fluctuating roughly between -14.5 and +18.1 ...]\n",
      "day: 2892, episode: 20\n",
      "begin_total_asset: 1000000.00\n",
      "end_total_asset: 2602112.56\n",
      "total_reward: 1602112.56\n",
      "total_cost: 4389.31\n",
      "total_trades: 47842\n",
      "Sharpe: 0.563\n",
      "=================================\n",
      "... [repetitive A2C training-log tables (iterations 11000-15000, total_timesteps 55000-75000) omitted; same fields, values evolve gradually] ...\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 15100      |\n",
      "|    time_elapsed       | 926        |\n",
      "|    total_timesteps    | 75500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -43.4      |\n",
      "|    explained_variance | -6.18e-05  |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 15099      |\n",
      "|    policy_loss        | -77.8      |\n",
      "|    reward             | 0.33084247 |\n",
      "|    std                | 1.08       |\n",
      "|    value_loss         | 6.62       |\n",
      "--------------------------------------\n",
      "---------------------------------------\n",
      "| time/                 |             |\n",
      "|    fps                | 81          |\n",
      "|    iterations         | 15200       |\n",
      "|    time_elapsed       | 932         |\n",
      "|    total_timesteps    | 76000       |\n",
      "| train/                |             |\n",
      "|    entropy_loss       | -43.4       |\n",
      "|    explained_variance | -1.19e-07   |\n",
      "|    learning_rate      | 0.0007      |\n",
      "|    n_updates          | 15199       |\n",
      "|    policy_loss        | 70.4        |\n",
      "|    reward             | -0.18443248 |\n",
      "|    std                | 1.08        |\n",
      "|    value_loss         | 3.46        |\n",
      "---------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 15300     |\n",
      "|    time_elapsed       | 938       |\n",
      "|    total_timesteps    | 76500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -43.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 15299     |\n",
      "|    policy_loss        | 11        |\n",
      "|    reward             | 3.4653084 |\n",
      "|    std                | 1.08      |\n",
      "|    value_loss         | 12.2      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 15400     |\n",
      "|    time_elapsed       | 944       |\n",
      "|    total_timesteps    | 77000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -43.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 15399     |\n",
      "|    policy_loss        | 71.8      |\n",
      "|    reward             | -6.204252 |\n",
      "|    std                | 1.08      |\n",
      "|    value_loss         | 9.22      |\n",
      "-------------------------------------\n",
      "------------------------------------\n",
      "| time/                 |          |\n",
      "|    fps                | 81       |\n",
      "|    iterations         | 15500    |\n",
      "|    time_elapsed       | 950      |\n",
      "|    total_timesteps    | 77500    |\n",
      "| train/                |          |\n",
      "|    entropy_loss       | -43.3    |\n",
      "|    explained_variance | 0        |\n",
      "|    learning_rate      | 0.0007   |\n",
      "|    n_updates          | 15499    |\n",
      "|    policy_loss        | 212      |\n",
      "|    reward             | 8.070333 |\n",
      "|    std                | 1.08     |\n",
      "|    value_loss         | 41       |\n",
      "------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 15600     |\n",
      "|    time_elapsed       | 956       |\n",
      "|    total_timesteps    | 78000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -43.3     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 15599     |\n",
      "|    policy_loss        | 478       |\n",
      "|    reward             | -3.866443 |\n",
      "|    std                | 1.08      |\n",
      "|    value_loss         | 115       |\n",
      "-------------------------------------\n",
      "--------------------------------------\n",
      "| time/                 |            |\n",
      "|    fps                | 81         |\n",
      "|    iterations         | 15700      |\n",
      "|    time_elapsed       | 963        |\n",
      "|    total_timesteps    | 78500      |\n",
      "| train/                |            |\n",
      "|    entropy_loss       | -43.3      |\n",
      "|    explained_variance | 0.139      |\n",
      "|    learning_rate      | 0.0007     |\n",
      "|    n_updates          | 15699      |\n",
      "|    policy_loss        | 27.7       |\n",
      "|    reward             | -2.0173943 |\n",
      "|    std                | 1.08       |\n",
      "|    value_loss         | 0.994      |\n",
      "--------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 15800     |\n",
      "|    time_elapsed       | 969       |\n",
      "|    total_timesteps    | 79000     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -43.4     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 15799     |\n",
      "|    policy_loss        | -6.87     |\n",
      "|    reward             | 2.2105525 |\n",
      "|    std                | 1.08      |\n",
      "|    value_loss         | 4.71      |\n",
      "-------------------------------------\n",
      "-------------------------------------\n",
      "| time/                 |           |\n",
      "|    fps                | 81        |\n",
      "|    iterations         | 15900     |\n",
      "|    time_elapsed       | 975       |\n",
      "|    total_timesteps    | 79500     |\n",
      "| train/                |           |\n",
      "|    entropy_loss       | -43.4     |\n",
      "|    explained_variance | 0         |\n",
      "|    learning_rate      | 0.0007    |\n",
      "|    n_updates          | 15899     |\n",
      "|    policy_loss        | 193       |\n",
      "|    reward             | 0.4441461 |\n",
      "|    std                | 1.08      |\n",
      "|    value_loss         | 23.1      |\n",
      "-------------------------------------\n",
      "------------------------------------\n",
      "| time/                 |          |\n",
      "|    fps                | 81       |\n",
      "|    iterations         | 16000    |\n",
      "|    time_elapsed       | 981      |\n",
      "|    total_timesteps    | 80000    |\n",
      "| train/                |          |\n",
      "|    entropy_loss       | -43.4    |\n",
      "|    explained_variance | 0        |\n",
      "|    learning_rate      | 0.0007   |\n",
      "|    n_updates          | 15999    |\n",
      "|    policy_loss        | -84.5    |\n",
      "|    reward             | 2.478934 |\n",
      "|    std                | 1.08     |\n",
      "|    value_loss         | 36.1     |\n",
      "------------------------------------\n"
     ],
     "output_type": "stream"
    }
   ],
   "source": [
    "if if_using_a2c:\n",
    "    agent = DRLAgent(env=env_train)\n",
    "    model_a2c = agent.get_model(\"a2c\")\n",
    "\n",
    "    # set up logger\n",
    "    tmp_path = os.path.join(RESULTS_DIR, 'a2c')\n",
    "    new_logger_a2c = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
    "    # Set new logger\n",
    "    model_a2c.set_logger(new_logger_a2c)\n",
    "\n",
    "    trained_a2c = agent.train_model(model=model_a2c,\n",
    "                                    tb_log_name='a2c',\n",
    "                                    total_timesteps=50000)\n",
    "\n",
    "    save_path = os.path.join(TRAINED_MODEL_DIR, 'a2c')\n",
    "    trained_a2c.save(os.path.join(save_path, \"agent_a2c.zip\"))"
   ]
  },
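  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each training cell in this section follows the same path convention: logs go to `RESULTS_DIR/<name>` and the model is saved to `TRAINED_MODEL_DIR/<name>/agent_<name>.zip`. A minimal sketch of that convention (the helper name `agent_paths` and the default directory names `results`/`trained_models` are illustrative assumptions; in practice use the notebook's own `RESULTS_DIR`/`TRAINED_MODEL_DIR`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Hypothetical helper: mirrors the logger/model path pattern repeated in each training cell.\n",
    "def agent_paths(name, results_dir='results', trained_dir='trained_models'):\n",
    "    logger_dir = os.path.join(results_dir, name)\n",
    "    model_file = os.path.join(trained_dir, name, 'agent_' + name + '.zip')\n",
    "    return logger_dir, model_file\n",
    "\n",
    "agent_paths('a2c')"
   ]
  },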
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MRiOtrywfAo1"
   },
   "source": [
    "### Agent 2: DDPG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "M2YadjfnLwgt",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "outputId": "90405544-f219-431a-8fe4-76bebba1be54",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "if if_using_ddpg:\n",
    "    agent = DRLAgent(env=env_train)\n",
    "    model_ddpg = agent.get_model(\"ddpg\")\n",
    "\n",
    "    # set up logger\n",
    "    tmp_path = os.path.join(RESULTS_DIR, 'ddpg')\n",
    "    new_logger_ddpg = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
    "    # Set new logger\n",
    "    model_ddpg.set_logger(new_logger_ddpg)\n",
    "\n",
    "    trained_ddpg = agent.train_model(model=model_ddpg,\n",
    "                                     tb_log_name='ddpg',\n",
    "                                     total_timesteps=50000)\n",
    "\n",
    "    save_path = os.path.join(TRAINED_MODEL_DIR, 'ddpg')\n",
    "    trained_ddpg.save(os.path.join(save_path, \"agent_ddpg.zip\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_gDkU-j-fCmZ"
   },
   "source": [
    "### Agent 3: PPO"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "y5D5PFUhMzSV",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "outputId": "f378334f-833f-4ec1-e2ba-f12a0bdebf70",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "PPO_PARAMS = {\n",
    "    \"n_steps\": 2048,\n",
    "    \"ent_coef\": 0.01,\n",
    "    \"learning_rate\": 0.00025,\n",
    "    \"batch_size\": 128,\n",
    "}\n",
    "\n",
    "if if_using_ppo:\n",
    "    agent = DRLAgent(env=env_train)\n",
    "    model_ppo = agent.get_model(\"ppo\", model_kwargs=PPO_PARAMS)\n",
    "\n",
    "    # set up logger\n",
    "    tmp_path = os.path.join(RESULTS_DIR, 'ppo')\n",
    "    new_logger_ppo = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
    "    # Set new logger\n",
    "    model_ppo.set_logger(new_logger_ppo)\n",
    "\n",
    "    trained_ppo = agent.train_model(model=model_ppo,\n",
    "                                    tb_log_name='ppo',\n",
    "                                    total_timesteps=200000)\n",
    "\n",
    "    save_path = os.path.join(TRAINED_MODEL_DIR, 'ppo')\n",
    "    trained_ppo.save(os.path.join(save_path, \"agent_ppo.zip\"))"
   ]
  },
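  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on `PPO_PARAMS` above: in stable-baselines3, PPO collects a rollout of `n_steps × n_envs` transitions and then splits it into minibatches of `batch_size`, so `n_steps * n_envs` should be divisible by `batch_size` to avoid a truncated final minibatch (SB3 warns otherwise). A quick stdlib-only sanity check; the single-environment assumption `n_envs = 1` is illustrative:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sanity check: the rollout length should be a multiple of batch_size.\n",
    "# n_envs = 1 is an assumption for illustration.\n",
    "n_envs = 1\n",
    "rollout = PPO_PARAMS['n_steps'] * n_envs\n",
    "assert rollout % PPO_PARAMS['batch_size'] == 0\n",
    "print(rollout // PPO_PARAMS['batch_size'], 'minibatches of', PPO_PARAMS['batch_size'])"
   ]
  },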
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3Zpv4S0-fDBv"
   },
   "source": [
    "### Agent 4: TD3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "JSAHhV4Xc-bh",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "outputId": "283fa6a7-1997-4816-8582-a9fcdcfa73d0",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "TD3_PARAMS = {\n",
    "    \"batch_size\": 100,\n",
    "    \"buffer_size\": 1000000,\n",
    "    \"learning_rate\": 0.001,\n",
    "}\n",
    "\n",
    "if if_using_td3:\n",
    "    agent = DRLAgent(env=env_train)\n",
    "    model_td3 = agent.get_model(\"td3\", model_kwargs=TD3_PARAMS)\n",
    "\n",
    "    # set up logger\n",
    "    tmp_path = os.path.join(RESULTS_DIR, 'td3')\n",
    "    new_logger_td3 = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
    "    # Set new logger\n",
    "    model_td3.set_logger(new_logger_td3)\n",
    "\n",
    "    trained_td3 = agent.train_model(model=model_td3,\n",
    "                                    tb_log_name='td3',\n",
    "                                    total_timesteps=50000)\n",
    "\n",
    "    save_path = os.path.join(TRAINED_MODEL_DIR, 'td3')\n",
    "    trained_td3.save(os.path.join(save_path, \"agent_td3.zip\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Dr49PotrfG01"
   },
   "source": [
    "### Agent 5: SAC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "xwOhVjqRkCdM",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "outputId": "f7688355-6ae8-483c-ec33-d202bcab702e",
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "text": [
      "{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}\n",
      "Using cuda device\n",
      "Logging to results\\sac\n",
      "day: 2892, episode: 30\n",
      "begin_total_asset: 1000000.00\n",
      "end_total_asset: 4068845.14\n",
      "total_reward: 3068845.14\n",
      "total_cost: 21184.91\n",
      "total_trades: 43718\n",
      "Sharpe: 0.759\n",
      "=================================\n",
      "-----------------------------------\n",
      "| time/              |            |\n",
      "|    episodes        | 4          |\n",
      "|    fps             | 33         |\n",
      "|    time_elapsed    | 348        |\n",
      "|    total_timesteps | 11572      |\n",
      "| train/             |            |\n",
      "|    actor_loss      | 368        |\n",
      "|    critic_loss     | 151        |\n",
      "|    ent_coef        | 0.0967     |\n",
      "|    ent_coef_loss   | -107       |\n",
      "|    learning_rate   | 0.0001     |\n",
      "|    n_updates       | 11471      |\n",
      "|    reward          | -3.0241768 |\n",
      "-----------------------------------\n",
      "----------------------------------\n",
      "| time/              |           |\n",
      "|    episodes        | 8         |\n",
      "|    fps             | 32        |\n",
      "|    time_elapsed    | 707       |\n",
      "|    total_timesteps | 23144     |\n",
      "| train/             |           |\n",
      "|    actor_loss      | 177       |\n",
      "|    critic_loss     | 33.7      |\n",
      "|    ent_coef        | 0.0307    |\n",
      "|    ent_coef_loss   | -144      |\n",
      "|    learning_rate   | 0.0001    |\n",
      "|    n_updates       | 23043     |\n",
      "|    reward          | 2.0108583 |\n",
      "----------------------------------\n",
      "day: 2892, episode: 40\n",
      "begin_total_asset: 1000000.00\n",
      "end_total_asset: 5350106.89\n",
      "total_reward: 4350106.89\n",
      "total_cost: 10445.33\n",
      "total_trades: 51816\n",
      "Sharpe: 0.889\n",
      "=================================\n",
      "---------------------------------\n",
      "| time/              |          |\n",
      "|    episodes        | 12       |\n",
      "|    fps             | 32       |\n",
      "|    time_elapsed    | 1056     |\n",
      "|    total_timesteps | 34716    |\n",
      "| train/             |          |\n",
      "|    actor_loss      | 93.5     |\n",
      "|    critic_loss     | 21.5     |\n",
      "|    ent_coef        | 0.00984  |\n",
      "|    ent_coef_loss   | -155     |\n",
      "|    learning_rate   | 0.0001   |\n",
      "|    n_updates       | 34615    |\n",
      "|    reward          | 2.77428  |\n",
      "---------------------------------\n",
      "---------------------------------\n",
      "| time/              |          |\n",
      "|    episodes        | 16       |\n",
      "|    fps             | 32       |\n",
      "|    time_elapsed    | 1403     |\n",
      "|    total_timesteps | 46288    |\n",
      "| train/             |          |\n",
      "|    actor_loss      | 60.2     |\n",
      "|    critic_loss     | 14.7     |\n",
      "|    ent_coef        | 0.00334  |\n",
      "|    ent_coef_loss   | -78.7    |\n",
      "|    learning_rate   | 0.0001   |\n",
      "|    n_updates       | 46187    |\n",
      "|    reward          | 3.231979 |\n",
      "---------------------------------\n",
      "----------------------------------\n",
      "| time/              |           |\n",
      "|    episodes        | 20        |\n",
      "|    fps             | 33        |\n",
      "|    time_elapsed    | 1741      |\n",
      "|    total_timesteps | 57860     |\n",
      "| train/             |           |\n",
      "|    actor_loss      | 37.9      |\n",
      "|    critic_loss     | 5.65      |\n",
      "|    ent_coef        | 0.00139   |\n",
      "|    ent_coef_loss   | 2.27      |\n",
      "|    learning_rate   | 0.0001    |\n",
      "|    n_updates       | 57759     |\n",
      "|    reward          | 3.1411095 |\n",
      "----------------------------------\n",
      "day: 2892, episode: 50\n",
      "begin_total_asset: 1000000.00\n",
      "end_total_asset: 5133030.87\n",
      "total_reward: 4133030.87\n",
      "total_cost: 1418.53\n",
      "total_trades: 49490\n",
      "Sharpe: 0.858\n",
      "=================================\n",
      "---------------------------------\n",
      "| time/              |          |\n",
      "|    episodes        | 24       |\n",
      "|    fps             | 33       |\n",
      "|    time_elapsed    | 2079     |\n",
      "|    total_timesteps | 69432    |\n",
      "| train/             |          |\n",
      "|    actor_loss      | 24.4     |\n",
      "|    critic_loss     | 4.37     |\n",
      "|    ent_coef        | 0.00157  |\n",
      "|    ent_coef_loss   | 4.93     |\n",
      "|    learning_rate   | 0.0001   |\n",
      "|    n_updates       | 69331    |\n",
      "|    reward          | 6.714324 |\n",
      "---------------------------------\n"
     ],
     "output_type": "stream"
    }
   ],
   "source": [
    "# On a single GPU, increasing the batch size did not raise memory usage or utilization; cause unknown.\n",
    "SAC_PARAMS = {\n",
    "    \"batch_size\": 128,\n",
    "    \"buffer_size\": 100000,\n",
    "    \"learning_rate\": 0.0001,\n",
    "    \"learning_starts\": 100,\n",
    "    \"ent_coef\": \"auto_0.1\",\n",
    "}\n",
    "\n",
    "if if_using_sac:\n",
    "    agent = DRLAgent(env=env_train)\n",
    "    model_sac = agent.get_model(\"sac\", model_kwargs=SAC_PARAMS)\n",
    "\n",
    "    # set up logger\n",
    "    tmp_path = os.path.join(RESULTS_DIR, 'sac')\n",
    "    new_logger_sac = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
    "    # Set new logger\n",
    "    model_sac.set_logger(new_logger_sac)\n",
    "\n",
    "    trained_sac = agent.train_model(model=model_sac,\n",
    "                                    tb_log_name='sac',\n",
    "                                    total_timesteps=70000)\n",
    "\n",
    "    save_path = os.path.join(TRAINED_MODEL_DIR, 'sac')\n",
    "    trained_sac.save(os.path.join(save_path, 'agent_sac.zip'))"
   ]
  }
 ]
}